Abstract. A simple yet successful approach to parallel satisfiability
(SAT) solving is to run several different SAT solvers (a portfolio) on
the input problem at the same time until one solver finds a solution. The
SAT solvers in the portfolio can be instances of a single solver with
different configuration settings. Additionally, the solvers can exchange
information, usually in the form of clauses. In this paper we investigate
whether this approach is applicable in the case of massively parallel SAT
solving. Our solver is intended to run on clusters with thousands of
processors, hence the name HordeSat. HordeSat is a fully distributed
portfolio-based SAT solver with a modular design that allows it to use any
SAT solver that implements a given interface. HordeSat has a decentralized
design and features hierarchical parallelism with interleaved communication
and search. We experimentally evaluated it using all the benchmark problems
from the application tracks of the 2011 and 2014 International SAT
Competitions. The experiments demonstrate that HordeSat is scalable up to
hundreds or even thousands of processors, achieving significant speedups
especially for hard instances.


1 Introduction 

Boolean satisfiability (SAT) is one of the most important problems of theoretical 
computer science with many practical applications in which SAT solvers are used 
in the background as high performance reasoning engines. These applications 
include automated planning and scheduling [21], formal verification [22], and
automated theorem proving [10]. In the last decades the performance of state- 
of-the-art SAT solvers has increased dramatically thanks to the invention of 
advanced heuristics [IS] , preprocessing and inprocessing techniques [T5j and data 
structures that allow efficient implementation of search space pruning [25].

The next natural step in the development of SAT solvers was parallelization. 
A very common approach to designing a parallel SAT solver is to run several 
instances of a sequential SAT solver with different settings (or several different 
SAT solvers) on the same problem in parallel. If any of the solvers succeeds 
in finding a solution all the solvers are terminated. The solvers also exchange 
information mainly in the form of learned clauses. This approach is referred to
as portfolio-based parallel SAT solving and was first used in the SAT solver
ManySat [14]. However, so far it was not clear whether this approach can scale
to a large number of processors.

Another approach is to run several search procedures in parallel and ensure 
that they work on disjoint regions of the search space. This explicit search space 
partitioning has been used mainly in solvers designed to run on large parallel 
systems such as clusters or grids of computers [9]. 

In this paper we describe HordeSat - a scalable portfolio-based SAT solver 
and evaluate it experimentally. Using efficient yet thrifty clause exchange and 
advanced diversification methods, we are able to keep the search spaces largely 
disjoint without explicitly splitting search spaces. Another important feature of 
HordeSat is its modular design, which allows it to be independent of any concrete 
search engines. HordeSat uses SAT solvers as black boxes communicating with
them via a minimalistic interface. 

Experiments made using benchmarks from the application tracks of the 2011
and 2014 SAT Competitions [3] show that HordeSat can outperform
state-of-the-art parallel SAT solvers on multiprocessor machines and is scalable
on computer clusters with thousands of processors. Indeed, we even observe
superlinear average speedup for difficult instances.

2 Preliminaries 

A Boolean variable is a variable with two possible values, True and False. By
a literal of a Boolean variable x we mean either x or ¬x (positive or negative
literal). A clause is a disjunction (OR) of literals. A conjunctive normal form
(CNF) formula is a conjunction (AND) of clauses. A clause can also be
interpreted as a set of literals and a formula as a set of clauses. A truth
assignment φ of a formula F assigns a truth value to its variables. The
assignment φ satisfies a positive (negative) literal if it assigns the value
True (False) to its variable, and φ satisfies a clause if it satisfies any of
its literals. Finally, φ satisfies a CNF formula if it satisfies all of its
clauses. A formula F is said to be satisfiable if there is a truth assignment φ
that satisfies F. Such an assignment is called a satisfying assignment. The
satisfiability problem (SAT) is to find a satisfying assignment of a given CNF
formula or determine that it is unsatisfiable.
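
As a concrete illustration of these definitions, the following minimal sketch checks whether a total truth assignment satisfies a CNF formula. It assumes the integer encoding of literals (variable j as j, its negation as -j); the type and function names are illustrative and not taken from HordeSat.

```cpp
#include <cstdlib>
#include <vector>

// A clause is a list of nonzero integers: +v is the positive literal of
// variable v, -v the negative one. A formula is a list of clauses.
using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// assignment[v] holds the truth value of variable v (index 0 is unused).
bool satisfiesClause(const Clause& clause, const std::vector<bool>& assignment) {
    for (int lit : clause) {
        const bool value = assignment[std::abs(lit)];
        if ((lit > 0) == value) return true;  // this literal is satisfied
    }
    return false;  // no literal is satisfied (also true for the empty clause)
}

// A CNF formula is satisfied if every one of its clauses is satisfied.
bool satisfiesFormula(const Formula& formula, const std::vector<bool>& assignment) {
    for (const Clause& clause : formula) {
        if (!satisfiesClause(clause, assignment)) return false;
    }
    return true;
}
```

For example, the formula (x1 OR NOT x2) AND (x2 OR x3) is satisfied by x1 = True, x2 = True, x3 = False.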


Conflict Driven Clause Learning. Most current complete state-of-the-art
SAT solvers are based on the conflict-driven clause learning (CDCL) algorithm [33].
In this paper we will use CDCL solvers only as black boxes and therefore we
provide only a very coarse-grained description. For a detailed discussion of
CDCL refer to [5]. In Figure 1 we give a pseudo-code of CDCL. The algorithm
performs a depth-first search of the space of partial truth assignments
(assignDecisionLiteral, and backtrack, which unassigns variables) interleaved with
search space pruning in the form of unit propagation (doUnitPropagation) and
learning new clauses when the search reaches a conflict state (analyzeConflict,
addLearnedClause). If a conflict cannot be resolved by backtracking then the
formula is unsatisfiable. If all the variables are assigned and no conflict is
detected then the formula is satisfiable.

    while not all variables assigned do
        assignDecisionLiteral
        doUnitPropagation
        if conflict detected then
            analyzeConflict
            addLearnedClause
            backtrack or return UNSAT
    return SAT

Fig. 1. Pseudo-code of the conflict-driven clause learning (CDCL) algorithm.


3 Related Work 


In this section we give a brief description of previous parallel SAT solving
approaches. A much more detailed listing and description of existing parallel solvers
can be found in recently published overview papers such as mm- 

Parallel CDCL — Pure Portfolios. The simplest approach is to run CDCL 
several times on the same problem in parallel with different parameter settings 
and exchanging learned clauses. If there is no explicit search space partitioning 
then this approach is referred to as the pure portfolio algorithm. The first parallel 
portfolio SAT solver was ManySat [14]. The winner of the latest (2014) SAT
Competition’s parallel track, Plingeling [3], is also of this kind.

The motivation behind the portfolio approach is that the performance of 
CDCL is heavily influenced by a high number of different settings and parameters 
of the search such as the heuristic used to select a decision literal. Numerous 
heuristics can be used in this step [25] but none of them dominates all the other
heuristics on each problem instance. Decision heuristics are only one of the many
settings that strongly influence the performance of CDCL solvers. All of these
settings can be considered when the diversification of the portfolio is performed.
For an example see ManySat [14]. Automatic configuration of SAT solvers in 
order to ensure that the solvers in a portfolio are diverse is also studied [50] . 

Exchanging learned clauses grants an additional boost of performance. It is 
an important mechanism to reduce duplicate work, i.e., parallel searches working 
on the same part of the search space. A clause learned from a conflict by one 
CDCL instance distributed to all the other CDCL instances will prevent them 
from doing the same work again in the future. 

The problem related to clause sharing is to decide how many and which
clauses should be exchanged. Exchanging all the learned clauses is infeasible,
especially in the case of large-scale parallelism. A simple solution is to distribute
all the clauses that satisfy some conditions. The conditions are usually related
to the length of the clauses and/or their glue value [1]. An interesting technique
called “lazy clause exchange” was introduced in a recent paper [5]. We leave 
the adaptation of this technique to future work however, since it would make 
the design of our solver less modular. Most of the existing pure portfolio SAT 
solvers are designed to run on single multi-processor computers. An exception is 
CL-SDSAT [T^ which is designed for solving very difficult instances on loosely 
connected grid middleware. It is unclear, and hard to quantify, whether this
approach can yield significant speedups since the involved sequential computation
times would be huge.


Parallel CDCL Partitioning The Search Space Explicitly. The classical
approach to parallelizing SAT solving is to split the search space between the
search engines such that no overlap is possible. This is usually done by starting
each solver with a different fixed partial assignment. If a solver discovers that
its partial assignment cannot be extended into a solution it receives a new
assignment. Numerous techniques have been presented for managing the search space
splitting based on ideas such guiding paths [5], work stealing [5D], and generat¬ 
ing sufficiently many tasks m- Similarly to the portfolio approach the solvers 
exchange clauses. 

Most of the previous SAT solvers designed for computer clusters or grids use
explicit search space partitioning. Examples of such solvers are GridSAT [9],
PM-SAT [11], GradSat [8], C-sat [55], ZetaSat [6] and SatCiety [28]. Experimentally
comparing HordeSat with those solvers is problematic, since these solvers are
not easily available online or they are implemented for special environments using
non-standard middleware. Nevertheless, we can draw some conclusions
from the experimental sections of the related publications.

Older grid solvers such as GradSat [8], PM-SAT [11], SatCiety [28], ZetaSat [6]
and C-sat [55] are evaluated only on small clusters (up to 64 processors) using
small sets of older benchmarks, which are easily solved by current state-of-the-art
sequential solvers, and therefore it is impossible to tell how well they scale for
a large number of processors and current benchmarks. The solver GridSAT is
run on a large heterogeneous grid of computers containing hundreds of nodes for 
several days and is reported to solve several (at that time) unsolved problems. 
Nevertheless, most of those problems can now be solved by sequential solvers 
in a few minutes. Speedup results are not reported. A recent grid-based solving 
method called Part-Tree-Learn m is compared to Plingeling and is reported 
to solve fewer instances than Plingeling. This is despite the fact that in their
comparison the number of processors available to Plingeling was slightly less m- 

To design a successful explicit partitioning parallel solver, complex load
balancing issues must be solved. Additionally, explicit partitioning clearly brings
runtime and space overhead. If the main motivation of explicit partitioning is to
ensure that the search-spaces explored by the solvers have no overlap, then we
believe that the extra work does not pay off and frequent clause sharing is enough
to approximate the desired behavior.¹ Moreover, in [18] the authors argue that
plain partitioning approaches can increase the expected runtime compared to 
pure portfolio systems. They prove that under reasonable assumptions there is 
always a distribution that results in an increased expected runtime unless the 
process of constructing partitions is ideal. 


4 Design Decisions 

In this section we provide an overview of the high level design decisions made 
when designing our portfolio-based SAT solver HordeSat. 

Modular Design. Rather than committing to any particular SAT solver we
design an interface that is universal and can be efficiently implemented by current
state-of-the-art SAT solvers. This results in a more general implementation and
the possibility to easily add new SAT solvers to our portfolio.

Decentralization. All the nodes in our parallel system are equivalent. There
is no leader or central node that manages the search or the communication.
Decentralized design allows more scalability and also simplifies the algorithm.

Overlapping Search and Communication. The search and the clause
exchange procedures run in different (hardware) threads in parallel. The system
is implemented in a way that the search procedure never waits for any shared
resources at the expense of losing some of the shared clauses.

Hierarchical Parallelization. HordeSat is designed to run on clusters of
computers (nodes) with multiple processor cores, i.e., we have two levels of
parallelization. The first level uses the shared memory model to communicate between
solvers running on the same node and the second level relies on message passing
between the nodes of a cluster.

The details and implementation of these points are discussed below. 

5 Black Box for Portfolios 

Our goal is to develop a general parallel portfolio solver based on existing
state-of-the-art sequential CDCL solvers without committing to any particular solver.
To achieve this we define a C++ interface that is used to access the solvers in
the portfolio. New SAT solvers can therefore be added easily just by implementing
this interface. By core solver we will mean a SAT solver implementing the
interface.

In this section we describe the essential methods of the interface. All the 
methods are required to be implemented in a thread safe way, i.e., safe execution 
by multiple threads at the same time must be guaranteed. First we start with 
the basic methods which allow us to solve formulas and interrupt the solver. 

¹ According to our experiments only 2-6% of the clauses are learned simultaneously
by different solvers in a pure portfolio, which is an indication that the overlap of
search-spaces is relatively small.

void addClause(vector<int> clause): This method is used to load the initial
formula that is to be solved. The clauses are represented as lists of literals which
are represented as integers in the usual way. All the clauses must be considered
by the solver at the next call of solve.

SatResult solve(): This method starts the search for the solution of the
formula specified by the addClause calls. The return value is one of the following
SatResult = {SAT, UNSAT, UNKNOWN}. The result UNKNOWN is returned when
the solver is interrupted by calling setSolverInterrupt().

void setSolverInterrupt(): Posts a request to the core solver instance to
interrupt the search as soon as possible. If the method solve has been called,
it will return UNKNOWN. Subsequent calls of solve on this instance must return
UNKNOWN until the method unsetSolverInterrupt is called.

void unsetSolverInterrupt(): Removes the request to interrupt the search.

Using these four methods, a simple portfolio can be built. When using several
instances of the same deterministic SAT solver, some diversification can be
achieved by adding the clauses in a different order to each solver.

More options for diversification are made possible via the following two
methods. A good way of diversification is to set default phase values for the variables
of the formula, i.e., truth values to be tried first. These are then used by the
core solver when selecting decision literals. In general many solver settings can
be changed to achieve diversification. Since these may be different for each core
solver we define a general method for diversification which the core solver can
implement in its own specific way.

void setPhase(int var, bool phase): This method is used to set a default
phase of a variable. The solver is allowed to ignore these suggestions.

void diversify(int rank, int size): This method tells the core solver to
diversify its settings. The specifics of diversification are left to the solver. The
provided parameters can be used by the solver to determine how many solvers
are working on this problem (size) and which one of those is this solver (rank).
A trivial implementation of this method could be to set the pseudo-random
number generator seed of the core solver to rank.

The final three methods of the interface deal with clause sharing. The solvers
can produce and accept clauses. Not all the learned clauses are shared. It is 
expected that each core solver initially offers only a limited number of clauses 
which it considers most worthy of sharing. The solver should increase the number 
of exported clauses when the method increaseClauseProduction is called. This 
can be implemented by relaxing the constraints on the learned clauses selected 
for exporting. 

void addLearnedClause(vector<int> clause): This method is used to add
learned clauses received from other solvers of the portfolio. The core solver can
decide when and whether the clauses added using this method are actually
considered during the search.

void setLearnedClauseCallback(LCCallback* callback): This method is
used to set a callback class that will process the clauses shared by this solver. To
export a clause, the core solver will call the void write(vector<int> clause)
method of the LCCallback class. Each clause exported by this method must be a
logical consequence of the clauses added using addClause or addLearnedClause.

void increaseClauseProduction(): Informs the solver that more learned clauses
should be shared. This could mean for example that learned clauses of bigger
size or higher glue value [1] will be shared.

The interface is designed to closely match current CDCL SAT solvers, but 
any kind of SAT solver can be used. For example a local search SAT solver could 
implement the interface by ignoring the calls to the clause sharing methods. 
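
To make the shape of the interface concrete, the nine methods described in this section can be collected in an abstract class. The sketch below is only an illustration: the method signatures follow the text, but the class names CoreSolver and NonSharingStub, the stub's members, and its behavior are assumptions rather than HordeSat's actual code. The stub shows how a solver that does not take part in clause sharing could satisfy the interface.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

enum SatResult { SAT, UNSAT, UNKNOWN };

// Receives the clauses that a core solver decides to export.
class LCCallback {
public:
    virtual ~LCCallback() = default;
    virtual void write(std::vector<int> clause) = 0;
};

// Hypothetical class name; the methods are the ones listed in the text.
class CoreSolver {
public:
    virtual ~CoreSolver() = default;
    virtual void addClause(std::vector<int> clause) = 0;
    virtual SatResult solve() = 0;
    virtual void setSolverInterrupt() = 0;
    virtual void unsetSolverInterrupt() = 0;
    virtual void setPhase(int var, bool phase) = 0;
    virtual void diversify(int rank, int size) = 0;
    virtual void addLearnedClause(std::vector<int> clause) = 0;
    virtual void setLearnedClauseCallback(LCCallback* callback) = 0;
    virtual void increaseClauseProduction() = 0;
};

// Illustrative stub: stores the formula, performs no search, and ignores
// every clause sharing call.
class NonSharingStub : public CoreSolver {
public:
    void addClause(std::vector<int> clause) override {
        std::lock_guard<std::mutex> lock(mutex_);  // keep the call thread safe
        clauses_.push_back(std::move(clause));
    }
    // A real solver would search here; the stub can only answer UNKNOWN.
    SatResult solve() override { return UNKNOWN; }
    void setSolverInterrupt() override { interrupted_ = true; }
    void unsetSolverInterrupt() override { interrupted_ = false; }
    void setPhase(int, bool) override {}  // suggestions may be ignored
    // Trivial diversification: use the rank as the random seed.
    void diversify(int rank, int) override { seed_ = rank; }
    void addLearnedClause(std::vector<int>) override {}      // ignored
    void setLearnedClauseCallback(LCCallback*) override {}   // never exports
    void increaseClauseProduction() override {}              // nothing to increase

    std::size_t numClauses() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return clauses_.size();
    }
    bool interrupted() const { return interrupted_.load(); }
    int seed() const { return seed_.load(); }

private:
    mutable std::mutex mutex_;
    std::vector<std::vector<int>> clauses_;
    std::atomic<bool> interrupted_{false};
    std::atomic<int> seed_{0};
};
```

A binding for a real solver would forward each call to the solver's own API instead of discarding it.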

For our experiments we implemented the interface by writing binding code 
for MiniSat [29] and Lingeling [4]. In the latter case no modifications to the
solver were required and the binding code only uses the incremental interface of 
Lingeling. As for MiniSat, the code has been slightly modified to support the 
three clause sharing methods. 

6 The Portfolio Algorithm 

In this section we describe the main algorithm used in HordeSat. As already
mentioned in Section 4, we use two levels of parallelization. HordeSat can be
viewed as a multithreaded program that communicates using messages with 
other instances of the same program. The communication is implemented using 
the Message Passing Interface (MPI) [Tl|. Each MPI process runs the same 
multithreaded program and takes care of the following tasks:

— Start the core solvers using solve. Use one fresh thread for each core solver. 

— Read the formula and add its clauses to each core solver using addClause. 

— Ensure diversification of the core solvers with respect to the other processes.

— Ensure that if one of the core solvers solves the problem all the other
core solvers and processes are notified and stopped. This is done by using
setSolverInterrupt for each core solver and sending a message to all
the participating processes.

— Collect the exported clauses from the core solvers, filter duplicates and send
them to the other processes. Accept the exported clauses of the other
processes, filter them and distribute them to the core solvers.

The tasks of reading the input formula, diversification, and solver starting are
performed once after the start of the process. The communication of ending and
clause exchange is performed periodically in rounds until a solution is found. The
main thread sleeps between these rounds for a given amount of time specified as
a parameter of the solver (usually around 1 second). The threads running the
core solvers are working uninterrupted during the whole time of the search.

6.1 Diversification 

Since we can only access the core solvers via the interface defined above, our
only tools for diversification are setting phases using the setPhase method and
calling the solver specific diversify method.

The setPhase method allows us to partition the search space in a semi-explicit
fashion. An explicit search space splitting into disjoint subspaces is usually
done by imposing phase restrictions instead of just recommending them.
The explicit approach is used in parallel solvers utilizing guiding paths [3] and 
dynamic work stealing |2[)) . 

We have implemented and tested the following diversification procedures 
based on literal phase recommendations. 

— Random. Each variable gets a phase recommendation for each core solver 
randomly. Note that this is different from selecting a random phase each 
time a decision is made for a variable in the CDCL procedure. 

— Sparse. Each variable gets a random phase recommendation on exactly one 
of the host solvers in the entire portfolio. For the other solvers no phase 
recommendation is made for the given variable. 

— Sparse Random. For each core solver each variable gets a random phase
recommendation with a probability of (#solvers)^-1, where #solvers is the
total number of core solvers in the portfolio.
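
The "sparse random" scheme can be sketched as follows. This is a minimal illustration, not HordeSat's implementation: the function name, the use of std::mt19937, and the seed parameter are assumptions. The function computes the recommendations for one core solver; the caller would pass each returned pair to that solver's setPhase method.

```cpp
#include <random>
#include <utility>
#include <vector>

// Returns the (variable, phase) recommendations for ONE core solver.
// Every variable 1..numVars is selected with probability 1/numSolvers and,
// if selected, receives a uniformly random phase. Variables that are not
// selected get no recommendation from this solver.
std::vector<std::pair<int, bool>> sparseRandomPhases(int numVars, int numSolvers,
                                                     unsigned seed) {
    std::mt19937 rng(seed);  // a different seed per core solver is assumed
    std::bernoulli_distribution pick(1.0 / numSolvers);
    std::bernoulli_distribution coin(0.5);
    std::vector<std::pair<int, bool>> phases;
    for (int var = 1; var <= numVars; ++var) {
        if (pick(rng)) phases.emplace_back(var, coin(rng));
    }
    return phases;
}
```

On average each variable then receives a recommendation on about one solver of the portfolio, which is the expected-value analogue of the "sparse" scheme, but without any coordination between the solvers.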

Each of these can be used in conjunction with the diversify method whose
behavior is defined by the core solvers. As already mentioned we use Lingeling
and MiniSat as core solvers. In case of MiniSat, we implemented the diversify
method by only setting the random seed. For Lingeling we copied the
diversification algorithm from Plingeling [4], which is the multi-threaded version of
Lingeling based on the portfolio approach and the winner of the parallel
application track of the 2014 SAT Competition [3]. In this algorithm 16 different
parameters of Lingeling are used for diversification.


6.2 Clause Sharing 

The clause sharing in our portfolio happens periodically in rounds. Each round a 
fixed sized (1500 integers in the implementation) message containing the literals 
of the shared clauses is exchanged by all the MPI processes in an all-to-all fashion. 
This is implemented by using the MPI_Allgather [T3] collective communication 
routine defined by the MPI standard. 

Each process prepares the message by collecting the learned clauses from
its core solvers. The clauses are filtered to remove duplicates. The fixed sized
message buffer is filled up with the clauses; shorter clauses are preferred. Clauses
that did not fit are discarded. If the buffer is not filled up to its full capacity
then one of the core solvers of the process is requested to increase its clause
production by calling the increaseClauseProduction method.
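
The message preparation step can be sketched as below. The text does not specify how clauses are laid out inside the fixed size message, so the encoding used here (a length prefix before each clause and zero padding at the end) is an assumption, as are all type and function names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct PackedMessage {
    std::vector<int> buffer;    // always exactly `capacity` integers
    std::size_t used = 0;       // integers occupied by clause data
    std::size_t discarded = 0;  // clauses that did not fit
};

// Packs clauses into a fixed size buffer, preferring shorter clauses.
// Assumed layout: [length, lit_1, ..., lit_length] per clause, then zeros.
PackedMessage packClauses(std::vector<std::vector<int>> clauses, std::size_t capacity) {
    // Stable sort keeps the original order among clauses of equal length.
    std::stable_sort(clauses.begin(), clauses.end(),
                     [](const std::vector<int>& a, const std::vector<int>& b) {
                         return a.size() < b.size();
                     });
    PackedMessage msg;
    msg.buffer.assign(capacity, 0);  // zero padding marks the unused tail
    for (const auto& clause : clauses) {
        const std::size_t needed = clause.size() + 1;  // prefix plus literals
        // Empty clauses are skipped because a zero prefix means "padding".
        if (clause.empty() || msg.used + needed > capacity) {
            ++msg.discarded;
            continue;
        }
        msg.buffer[msg.used++] = static_cast<int>(clause.size());
        for (int lit : clause) msg.buffer[msg.used++] = lit;
    }
    return msg;
}
```

With the buffer size of 1500 integers mentioned above, a caller could compare `used` with the capacity after packing and, if room is left, call increaseClauseProduction on one of its core solvers.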

The detection of duplicate clauses is implemented by using Bloom filters [7].
A Bloom filter is a space-efficient probabilistic set data structure that allows
false-positive matches, which in our case means that some clauses might be
considered to be duplicates even if they are not. The usage of Bloom filters
requires a set of hash functions that map clauses to integers. We use the following
hash function which ensures that permuting the literals of a clause does not
change its hash value.

\[ \bigoplus_{\ell \in C} \ell \cdot \mathit{primes}[\operatorname{abs}(\ell \cdot i) \bmod |\mathit{primes}|] \]

where i > 0 is a parameter we are free to choose, C is a clause, ⊕ denotes
bitwise exclusive-or, and primes is an array of large prime numbers. Literals are
interpreted as integers in the usual way, i.e., x_j as j and ¬x_j as −j.
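
A direct transcription of this hash function might look as follows. The prime table, the 64-bit word size, and the function name are assumptions made for the sketch; the text only states that primes is an array of large prime numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Illustrative prime table (assumed; not the one used by HordeSat).
static const std::uint64_t kPrimes[] = {104729ULL,     999983ULL,     15485863ULL,
                                        982451653ULL,  998244353ULL,  1000000007ULL,
                                        1000000009ULL, 2147483647ULL};
constexpr long long kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

// XOR over all literals l of the clause of l * primes[abs(l * i) mod |primes|].
// XOR is commutative and associative, so the literal order does not matter.
std::uint64_t clauseHash(const std::vector<int>& clause, int i) {
    std::uint64_t h = 0;
    for (int lit : clause) {
        // The product is computed in 64 bits to avoid signed overflow.
        const long long prod = static_cast<long long>(lit) * i;
        const std::size_t idx = static_cast<std::size_t>(std::llabs(prod) % kNumPrimes);
        // Unsigned multiplication wraps around, which is harmless for hashing.
        h ^= static_cast<std::uint64_t>(static_cast<long long>(lit)) * kPrimes[idx];
    }
    return h;
}
```

Different values of i give the family of hash functions that a Bloom filter needs, for instance i = 1, 2, 3, 4 for a filter with four hash functions.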

Each MPI process maintains one Bloom filter g_x for each of its core solvers
x and an additional global one g. When a core solver x exports a learned clause
C, the following steps are taken.

— Clause C is added to g_x.

— If C ∉ g, C is added to g as well as into a data structure e for export.

— If several core solvers concurrently try to access e, only one will succeed and
the new clauses of the other core solvers are ignored. This way, we avoid
contention at the shared resource e and rather ignore some clauses.

After the global exchange of learned clauses, the incoming clauses need to be
filtered for duplicates and distributed to the core solvers. The first task is done
by using the global Bloom filter g. For the second task we utilize the thread local
filters g_x to ensure that each of them receives only new clauses.
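
The export steps for a single clause can be sketched with a small Bloom filter. This is a single-threaded illustration under several assumptions: the filter size, the number of hash functions, the compact prime table, and all names are hypothetical, and the non-blocking access to the export structure e is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Clause = std::vector<int>;

// Order-independent clause hash in the style described in the text;
// the four primes are an illustrative choice.
inline std::uint64_t clauseHash(const Clause& c, int i) {
    static const std::uint64_t primes[] = {104729ULL, 999983ULL, 15485863ULL,
                                           982451653ULL};
    std::uint64_t h = 0;
    for (int lit : c) {
        const long long idx = std::llabs(static_cast<long long>(lit) * i) % 4;
        h ^= static_cast<std::uint64_t>(static_cast<long long>(lit)) * primes[idx];
    }
    return h;
}

class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits, int numHashes = 4)
        : bits_(bits, false), k_(numHashes) {}

    // Inserts the clause and reports whether it appeared to be present
    // already. False positives are possible, false negatives are not.
    bool testAndInsert(const Clause& c) {
        bool present = true;
        for (int i = 1; i <= k_; ++i) {
            const std::size_t pos = clauseHash(c, i) % bits_.size();
            if (!bits_[pos]) {
                present = false;
                bits_[pos] = true;
            }
        }
        return present;
    }

    // Periodic reset, which lets clauses be shared again later.
    void reset() { bits_.assign(bits_.size(), false); }

private:
    std::vector<bool> bits_;
    int k_;
};

// Export step for one clause of core solver x: record it in the solver's own
// filter, and queue it for export only if the global filter has not seen it.
bool exportClause(const Clause& c, BloomFilter& local, BloomFilter& global,
                  std::vector<Clause>& exportQueue) {
    local.testAndInsert(c);
    if (global.testAndInsert(c)) return false;  // duplicate or false positive
    exportQueue.push_back(c);
    return true;
}
```

In a multithreaded setting the push into the export queue would be guarded by a non-blocking attempt to acquire e, dropping the clause when the attempt fails, in line with the third step above.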

All the Bloom filters are periodically reset, which allows the repeated sharing 
of clauses after some time. Our initial experiments showed that this approach is 
more beneficial than maintaining a strict “no duplicate clauses allowed”-policy. 

Overall, there are three reasons why a clause offered by a core solver can get 
discarded. One is that it was duplicate or wrongly considered to be duplicate due 
to the probabilistic nature of Bloom filters. Second is that another core solver 
was adding its clause to the data structure for global export at the same time. 
The last reason is that it did not fit into the fixed size message sent to the other 
MPI processes. Although important learned clauses might get lost, we believe 
that this relaxed approach is still beneficial since it allows a simpler and more 
efficient implementation of clause sharing. 

7 Experimental Evaluation 

To examine our portfolio-based parallel SAT solver HordeSat we did experiments
with two kinds of benchmarks. We used the benchmark formulas from the
application tracks of the 2011 and 2014 SAT Competitions [3] (545 instances)
and randomly generated 3-SAT formulas (200 sat and 200 unsat instances). The 
random formulas have 250-440 variables and 4.25 times as many clauses, which 
corresponds to the phase transition of 3-SAT problems m- 

² Originally we only used the 2014 instances. A reviewer suggested trying the 2011
instances also, conjecturing that they would be harder to parallelize. Surprisingly,
the opposite turned out to be true.

Fig. 2. The influence of diversification and clause sharing on the performance of
HordeSat using Lingeling (16 processes with 1 thread each) on random 3-SAT problems.


The experiments were run on a cluster allowing us to reserve up to 128 nodes. 
Each node has two octa-core Intel Xeon E5-2670 processors (Sandy Bridge) with 
2.6 GHz and 64 GB of main memory. Therefore each node has 16 cores and 
the total number of available cores is 2048. The nodes communicate using an 
InfiniBand 4X QDR Interconnect and use the SUSE Linux Enterprise Server 11 
(x86_64) (patch level 3) operating system. HordeSat was compiled using g++
(SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] with the “-O3” flag.

If not stated otherwise, we use the following parameters: The time of sleeping 
between clause sharing rounds is 1 second. The default diversification algorithm 
is the combination of “sparse random” and the native diversification of the core 
solver. In the current version two core solvers are supported - Lingeling and 
MiniSat. The default value is Lingeling which is used in all the experiments 
presented below. It is also possible to use a combination of Lingeling and MiniSat.
Using only Lingeling gives by far the best results on the used benchmarks. The
time limit per instance is 1000 seconds for parallel solvers and 50 000 seconds for
the sequential solver Lingeling. Detailed results of all the presented experiments
as well as the source code of HordeSat and all the used benchmark problems can
be found at http://baldur.iti.kit.edu/hordesat.

7.1 Clause Sharing and Diversification 

We investigated the individual influence of clause sharing and diversification 
on the performance of our portfolio. In the case of application benchmarks we 
obtained the unsurprising result that both diversification and clause sharing are 
highly beneficial for satisfiable as well as unsatisfiable instances. However, for 
random 3-SAT problems the results are more interesting. 

By looking at the cactus plots in Figure 2 we can observe that clause sharing
is essential for unsatisfiable instances while not significant and even slightly 
detrimental for satisfiable problems. On the other hand, diversification has only 
a small benefit for unsatisfiable instances. This observation is related to a more 
general question of intensification vs diversification in parallel SAT solving m- 

For the experiments presented in Figure 2 we used sparse diversification
combined with the diversify method, which in this case copies the behavior
of Plingeling. It is important to note that some diversification arises due to the
non-deterministic nature of Lingeling, even when we do not invoke it explicitly
by using the setPhase or diversify methods.

7.2 Scaling on Application Benchmarks 

In parallel processing, one usually wants good scalability in the sense that the 
speedup over the best sequential algorithm goes up near linearly with the number 
of processors. Measuring scalability in a reliable and meaningful way is difficult 
for SAT solving since running times are highly nondeterministic. Hence, we need 
careful experiments on a large benchmark set chosen in an unbiased way. We 
therefore use the application benchmarks of the 2011 and 2014 SAT
Competitions. Our sequential reference is Lingeling which won the most recent (2014)
competition. We ran experiments using 1, 2, 4, ..., 512 processes with four threads
each; each cluster node runs 4 processes. The results are summarized in Figure 3
using cactus plots. We can observe that increased parallelism is always beneficial
for the 2011 benchmarks. In the case of all the benchmarks the benefits beyond
32 nodes are not apparent.

From a cactus plot it is not easy to see whether the additional performance
is a reasonable return on the invested hardware resources. Therefore Table 1
summarizes that information in several ways in order to quantify the overall
scalability of HordeSat on the union of the 2011 and 2014 benchmarks. We
compute speedups for all the instances solved by the parallel solver. For instances
not solved by Lingeling within its time limit T = 50 000s we generously assume
that it would solve them if given T + ε seconds and use the runtime of T for
speedup calculation. Column 4 gives the average of these values. We observe


[Figure 3: two cactus plots, one per benchmark set; x-axis "Problems"; curves for Lingeling and the HordeSat configurations 1x4x4, 2x4x4, 4x4x4, 8x4x4, 16x4x4, 32x4x4, 64x4x4 and 128x4x4.]

Fig. 3. The impact of doubling the number of processors on the runtime and the
number of solved problems for the 2011 and the union of 2011 and 2014 application
instances. The labels represent (#nodes)x(#processes/node)x(#threads/process).


considerable superlinear speedups on average for all the configurations tried. 
However, this average is not a very robust measure since it is highly dependent 
on a few very large speedups that might be just luck. In Column 5 we show the 
total speedup, which is the sum of sequential runtimes divided by the sum of 
parallel runtimes and Column 6 contains the median speedup. 
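
The three aggregate measures can be computed as in the following sketch. The struct and function names are illustrative, and the inputs are assumed to already contain the time limit T for instances that the sequential solver did not finish.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Speedups {
    double average;  // mean of the per-instance speedups
    double total;    // sum of sequential times / sum of parallel times
    double median;   // median of the per-instance speedups
};

// seq[k] and par[k] are the sequential and parallel runtimes in seconds of
// the k-th instance solved by the parallel solver. Both vectors must have
// the same nonzero length and all parallel runtimes must be positive.
Speedups computeSpeedups(const std::vector<double>& seq, const std::vector<double>& par) {
    std::vector<double> ratios;
    double sumSeq = 0.0, sumPar = 0.0, sumRatios = 0.0;
    for (std::size_t k = 0; k < seq.size(); ++k) {
        const double r = seq[k] / par[k];
        ratios.push_back(r);
        sumRatios += r;
        sumSeq += seq[k];
        sumPar += par[k];
    }
    std::sort(ratios.begin(), ratios.end());
    const std::size_t n = ratios.size();
    const double median =
        (n % 2 == 1) ? ratios[n / 2] : 0.5 * (ratios[n / 2 - 1] + ratios[n / 2]);
    return {sumRatios / n, sumSeq / sumPar, median};
}
```

For sequential times of 60, 2 and 2 seconds and parallel times of 1, 1 and 2 seconds, the per-instance speedups are 60, 2 and 1, giving an average of 21, a total speedup of 64/4 = 16 and a median of 2. The single large value dominates the average, which illustrates why the average alone is not a robust measure.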

Nevertheless, these figures treat HordeSat unfairly since most instances are 
actually too easy for investing a lot of hardware. Indeed, in parallel computing, 
it is usual to analyze the performance on many processors using weak scaling 
where one increases the amount of work involved in the considered instances 
proportionally to the number of processors. Therefore in Columns 7-9 we restrict 
ourselves to those instances where Lingeling needs at least 10p seconds, 
where p is the number of core solvers used by HordeSat. The average speedup 

             Solved         Speedup (all)           Speedup (big)
Config    Par.   Both    Avg.    Tot.    Med.    Avg.    Tot.    Med.     CBS
…         …      …       …     109.34   13.05    2607  352.16  867.00  216.69
pling8    372    357      44    18.61    3.11      67   19.20    4.12    4.77
pling16   400    377     347    24.83    3.53     586   26.18    5.89    7.34
1x8x1     373    358      53    19.57    3.13      81   20.42    4.36    4.79
1x16x1    400    376     325    27.78    4.06     548   30.30    6.98    7.34


Table 1. HordeSat configurations (#nodes)x(#processes/node)x(#threads/process) 
compared to Plingeling with a given number of threads. The second column is the 
number of instances solved by the parallel solvers, the third is the number of instances 
solved by both Lingeling and the parallel solver. The following six columns contain the 
average, total, and median speedups for either all the instances solved by the parallel 
solvers or only big instances (solved after 10(#threads) seconds by Lingeling). The last 
column contains the “count based speedup” values defined in Subsection 7.2. 


gets considerably larger, as does the total speedup, especially for the large 
configurations. The median speedup also increases but remains slightly sublinear. 
Figure 4 shows the distribution of speedups for these instances. 
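
The restriction to big instances amounts to a simple filter. The sketch below is illustrative only: the function name `big_instances` and the example data are hypothetical, and we assume that an instance not solved by Lingeling is counted with the full time limit T = 50 000s.

```python
def big_instances(runs, p, t_seq=50_000.0):
    """Keep the runs whose sequential time is at least 10 * p seconds.

    runs is a list of (seq_time, par_time) pairs; seq_time is None when the
    sequential solver timed out (assumed here to count as t_seq).
    p is the number of core solvers used by the parallel configuration.
    """
    kept = []
    for seq, par in runs:
        seq_time = t_seq if seq is None else seq
        if seq_time >= 10 * p:
            kept.append((seq_time, par))
    return kept


# Hypothetical data for p = 16 core solvers (threshold: 160 seconds).
runs = [(100.0, 10.0), (None, 50.0), (400.0, 20.0)]
print(big_instances(runs, p=16))
```

The threshold grows with p, so larger configurations are evaluated on fewer but harder instances.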

Another way to measure speedup robustly is to compare the times needed 
to solve a given number of instances. Let $T_1$ ($T_p$) denote the per instance time 
limit of the sequential (parallel) solver (50 000s (1 000s) in our case). Let $n_1$ ($n_p$) 
denote the number of instances solved by the sequential (parallel) solver within 
time $T_1$ ($T_p$). If $n_1 > n_p$ ($n_1 < n_p$) let $T'_1$ ($T'_p$) denote the smallest time limit for 
the sequential (parallel) solver such that it solves $n_p$ ($n_1$) instances within the 
time limit $T'_1$ ($T'_p$). We define the count based speedup (CBS) as 

\[
\mathrm{CBS} =
\begin{cases}
T_1 / T'_p & \text{if } n_1 < n_p, \\
T'_1 / T_p & \text{otherwise.}
\end{cases}
\]


The CBS scales almost linearly up to 512 cores and stagnates afterward. 
We are not sure whether this indicates a scalability limit of HordeSat or rather 
reflects a lack of sufficiently difficult instances - in our collection, there are only 
65 eligible instances. 
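
A minimal sketch of the CBS computation is given below. It assumes that the two cases of the definition are: sequential time limit divided by the reduced parallel limit when the sequential solver solves fewer instances, and reduced sequential limit divided by the parallel time limit otherwise. The function names and the example runtimes are hypothetical.

```python
def smallest_limit(solved_times, k):
    """Smallest time limit under which at least k of the instances are solved."""
    if k < 1 or k > len(solved_times):
        raise ValueError("k must be between 1 and the number of solved instances")
    return sorted(solved_times)[k - 1]


def count_based_speedup(seq_times, par_times, t_seq, t_par):
    """Count based speedup from per-instance runtimes and the two time limits."""
    solved_seq = [t for t in seq_times if t <= t_seq]
    solved_par = [t for t in par_times if t <= t_par]
    n_seq, n_par = len(solved_seq), len(solved_par)
    if n_seq < n_par:
        # The parallel solver needs only a reduced limit to match n_seq.
        return t_seq / smallest_limit(solved_par, n_seq)
    # The sequential solver needs only a reduced limit to match n_par.
    return smallest_limit(solved_seq, n_par) / t_par


# Hypothetical runtimes in seconds.
seq = [100.0, 2000.0, 40000.0]
par = [5.0, 20.0, 300.0, 900.0]
print(count_based_speedup(seq, par, t_seq=50_000.0, t_par=1_000.0))
```

In this made-up example the sequential solver solves three instances and the parallel solver already solves three within 300 seconds, so the value is 50 000 / 300.
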
Fig. 4. Distribution of speedups on the “big instances” - the data corresponding to 
Columns 7-9 of Table 1. 



[Figure 5: plot with x-axis "Problems".]


Fig. 5. Comparison of HordeSat and Plingeling with Lingeling on the 2011 and 2014 
SAT Competition benchmarks. 


7.3 Comparison with Plingeling 

The most similar parallel SAT solver to our portfolio is the state-of-the-art solver 
Plingeling [4], the winner of the parallel track of the 2014 SAT 
Competition. Both solvers are portfolio-based, both use Lingeling, and they even 
share some diversification code. The main differences are in the clause sharing 
algorithms and in that Plingeling does not run on clusters, only on single computers. 
For this reason we can compare the two solvers only on a single node. The 
results of this comparison on the benchmark problems of the 2011 and 2014 SAT 
Competitions are displayed in Figure 5. Speedup values are given in Table 1. 


Both solvers significantly outperform Lingeling. The performance of HordeSat 
and Plingeling is almost indistinguishable when running with 8 cores, while 
on 16 cores HordeSat gets slightly ahead of Plingeling. 

8 Conclusion 

HordeSat has the potential to reduce solution times of difficult yet solvable 
SAT instances from hours to minutes using hundreds of cores on commodity 
clusters. This may open up new interactive applications of SAT solving. We find 
it surprising that this was achieved using a relatively simple, portfolio-based 
approach that is independent of the underlying core solver. In particular, this 
makes it likely that HordeSat can track future progress of sequential SAT solvers. 

The SAT solver that works best with HordeSat for application benchmarks 
is Lingeling. Plingeling is another parallel portfolio solver based on Lingeling 
and it is also the winner of the most recent (2014) SAT Competition. Comparing 
the performance of HordeSat and Plingeling reveals that the two are almost 
indistinguishable when running with 8 cores and that HordeSat slightly outperforms 
Plingeling when running with 16 cores. This demonstrates that there is still room for the 
improvement of shared memory based parallel portfolio solvers. 

Our experiments on a cluster with up to 2048 processor cores show that 
HordeSat is scalable in highly parallel environments. We observed superlinear 
and nearly linear scaling in several measures such as average, total, and median 
speedups, particularly on hard instances. In each case increasing the number of 
available cores resulted in significantly reduced runtimes. 

8.1 Future Work 

An important next step is to work on the scalability of HordeSat for 1024 cores 
and beyond. This will certainly involve more adaptive clause exchange strategies. 
Even for single node configurations, low level performance improvements when 
using modern machines with dozens of cores seem possible. We also would like 
to investigate what benefits can be gained by having a tighter integration of 
core solvers by extending the interface. Including other kinds of (not necessarily 
CDCL-based) core solvers might also bring improvements. 

When considering massively parallel SAT solving we probably have to move 
to even more difficult instances to make that meaningful. When this also means 
larger instances, memory consumption may be an issue when running many 
instances of a SAT solver on a many-core machine. Here it might be interesting 
to explore opportunities for sharing data structures for multiple SAT solvers or 
to decompose problems into smaller subproblems by recognizing their structure. 


Acknowledgment. We would like to thank Armin Biere for fruitful discussion 
about the usage of the Lingeling API in a parallel setting. 